Paper2vec: Citation-Context Based Document Distributed Representation for Scholar Recommendation
Authors
Abstract
Due to the availability of references of research papers and the rich information contained in papers, various citation analysis approaches have been proposed to identify similar documents for scholar recommendation. Despite their success, these approaches are based on the co-occurrence of items, so they do not work well when documents share no co-occurring items. Inspired by distributed representations of words in the natural language processing literature, we propose a novel approach to measuring the similarity of papers based on distributed representations learned from the citation context of papers. We view the set of papers as the vocabulary, define the weighted citation context of papers, and convert it to a weight matrix similar to the word-word co-occurrence matrix in natural language processing. We then explore a variant of the matrix factorization approach to train distributed representations of papers on this matrix, and leverage the distributed representations to measure similarities between papers. In our experiments, we show that our approach outperforms state-of-the-art citation-based approaches by 25% and performs better than other distributed-representation-based methods.

Preprint submitted to Neurocomputing, March 21, 2017. arXiv:1703.06587v1 [cs.IR] 20 Mar 2017

Table 1: List of some popular datasets. Citation indexes often contain many more records than full-text datasets.

| | WoS | Scopus | CiteSeerX | DBLP | PMC | arXiv |
| Full text availability | No | No | Yes | Yes | Yes | Yes |
| Records (millions) | ∼90 | ∼55 | ∼6 | ∼3 | ∼3 | ∼1 |
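The pipeline outlined in the abstract (weighted citation context, weight matrix, matrix factorization, similarity) can be illustrated with a minimal sketch. The abstract does not specify the weighting scheme or the factorization variant, so everything here is an assumption made for illustration: citation links are treated as undirected, a paper's context is every paper within `max_hops` links, the weight is 1 / hop distance, and a plain truncated SVD of log(1 + X) stands in for the paper's factorization. The function names `build_context_matrix`, `embed`, and `cosine_similarity` are hypothetical.

```python
from collections import deque

import numpy as np


def build_context_matrix(citations, max_hops=2):
    """Build a weighted citation-context matrix X (assumed weight: 1 / hop distance)."""
    papers = sorted(set(citations) | {c for refs in citations.values() for c in refs})
    index = {p: i for i, p in enumerate(papers)}

    # Assumption: citation links are treated as undirected neighbour relations.
    neighbours = {p: set() for p in papers}
    for src, refs in citations.items():
        for dst in refs:
            if dst != src:
                neighbours[src].add(dst)
                neighbours[dst].add(src)

    X = np.zeros((len(papers), len(papers)))
    for p in papers:
        # Breadth-first search, stopping after max_hops links.
        dist = {p: 0}
        queue = deque([p])
        while queue:
            u = queue.popleft()
            if dist[u] == max_hops:
                continue
            for v in neighbours[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        for q, d in dist.items():
            if d > 0:
                X[index[p], index[q]] = 1.0 / d
    return papers, X


def embed(X, dim=2):
    """Stand-in factorization: truncated SVD of log(1 + X), not the paper's variant."""
    U, S, _ = np.linalg.svd(np.log1p(X))
    return U[:, :dim] * np.sqrt(S[:dim])


def cosine_similarity(E):
    """Pairwise cosine similarity between the rows of E."""
    norms = np.maximum(np.linalg.norm(E, axis=1, keepdims=True), 1e-12)
    unit = E / norms
    return unit @ unit.T


# Toy citation graph: A and B both cite C; D cites E in a separate component.
citations = {"A": ["C"], "B": ["C"], "D": ["E"]}
papers, X = build_context_matrix(citations)
sim = cosine_similarity(embed(X, dim=2))
print(papers)
print(np.round(sim, 3))
```

In this toy graph, A and B never cite each other but share the cited paper C, so their learned vectors end up close, while A and D sit in unrelated components. A realistic citation matrix would be large and sparse, so a sparse solver would be the more practical choice than the dense SVD used here.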
Similar resources
Paper2vec: Combining Graph and Text Information for Scientific Paper Representation
We present Paper2vec, a novel neural network embedding based approach for creating scientific paper representations which make use of both textual and graph-based information. An academic citation network can be viewed as a graph where individual nodes contain rich textual information. With the current trend of open-access to most scientific literature, we presume that this full text of a scien...
The role of semantic relations in improving the results of a citation recommendation system - Selected paper of the 17th National Conference of the Computer Society of Iran
With the increasing growth of scientific documents on the Web, it is difficult to select a relevant document. A citation recommendation system receives a text and recommends documents to be cited by the text. Such recommendations help a researcher find relevant texts. Based on semantic relations, this paper presents a new indicator to measure the similarity between documents an...
Citation Resolution: A method for evaluating context-based citation recommendation systems
Wouldn’t it be helpful if your text editor automatically suggested papers that are relevant to your research? Wouldn’t it be even better if those suggestions were contextually relevant? In this paper we name a system that would accomplish this a context-based citation recommendation (CBCR) system. We specifically present Citation Resolution, a method for the evaluation of CBCR systems which exc...
Contextual Recommendation with Path Constrained Random Walks
Recommendation has become an increasingly important research area because of the abundance of online documents and shopping opportunities. The readily available context information for these tasks poses challenges to recommendation systems, which must deal with heterogeneous and interconnected problem spaces. In this study, we use a graph representation for publication databases with rich meta-data. Wi...
Personalized Reading Recommendations for Saccharomyces Genome Database
The rapid growth of research in biology, and the increasing degree to which different subareas of biology are connected, make it difficult to monitor the published literature effectively. To address this problem, we develop a reading recommendation system that requires no other input from users except their reading or citation history. This frees the users from the problem of expressing their i...
Journal title:
- CoRR
Volume: abs/1703.06587, Issue: -
Pages: -
Publication date: 2017